“Civilizations advance not by the technology they know about, but by the technology they don’t have to know about.” – Anonymous proverb
Through the Case Studies (Chapters 4 and 5) and the discussion in Chapter 6, a clear understanding of what people want from direct and indirect data relations (RQ1 & RQ2) has been established. In this chapter, we turn our attention from theory to practice, from what is needed to what is possible. Specifically, this chapter will return to the overall research question “What relationship do people need with their personal data, and how might that be achieved?” and address its second clause. This chapter describes practical approaches for future research and innovation, in a way that is deliberately broad and shallow, from the perspective that it is more useful to introduce a wide range of applicable ideas than to go into great detail on just a few. It is not intended to form a complete or exhaustive roadmap; it is a snapshot of ongoing work, identified challenges and known opportunities, forming an anthology of reference material for designers and innovators in this space. These ideas are illustrated through real-world insights and activities from the four industrial and academic research projects I was part of, and from the work of other innovators and activists. This chapter also builds upon the theoretical insights from the Case Studies in order to inform the design of future research, innovation and policy as to how the better Human Data Relations conceived in this thesis thus far might be achieved.
The approach this chapter takes is to name and illustrate the challenges and opportunities that are relevant when attempting to bring about changes in the world that would bring people closer to the six HDR wants that this thesis has uncovered. There are many aspects to such a wide-reaching objective: technical, design, commercial, legal, moral, social and political. This chapter does not pretend to cover them all, nor to be formal empirical research. However, I have been fortunate to have undertaken, during the same period as I have been working on this PhD but outside of the research itself, direct embedded work on projects related to personal data interaction (3.4.3) in academic and industrial research that directly contribute to the question of how to bring about better human data relations in practice. As a result, some of the challenges and opportunities herein are described in greater detail than others, corresponding only to my proximity to and depth of engagement with those ideas rather than their relative merit, complexity or impact potential.
In section 7.1.1 the external activities I undertook are described; they form a primary point of reference for insights and illustrations shared in this chapter, as they have allowed me to learn enough to provide a useful overview and highlight many important and evolving areas where different actors are trying to bring about changes that often align well to the six data wants uncovered in the previous chapters.
In section 7.1.2, I explain some important context about the nature of the ideas presented in this chapter and how to attribute them fairly.
In section 7.1.3, I introduce some additional background on Theories of Change (ToC), which are used as a framing device for structuring the insights described in the main body of this chapter into a series of different possible trajectories for change.
In section 7.1.4, I consider the researcher-turned-activist stance that drives this chapter, framing the pursuit of better HDR as a recursive public.
In section 7.2, to provide deeper context for what follows, the concept of HDR is expanded to identify some additional insights into how people relate to data, and an important dichotomy of two distinct drivers that motivate people’s needs for better relations with their data.
Section 7.3 and 7.4 form the main body of this chapter, with obstacles and insights being detailed in section 7.3 and specific opportunities into how better Human Data Relations can be pursued in practice described in 7.4. 7.4 is structured using the ToC framing described in 7.1.3, as a series of named opportunities fitting into each trajectory of change.
Section 7.5 concludes the thesis, summarising the change trajectories presented in 7.4, the thesis’ contributions as a whole, and answering the overall research question.
[TODO Move 3.4.3 etc. to here and remove all refs to 3.4.3]
The majority of examples and learnings shared in this chapter come from my participation as an expert researcher and designer in two industrial research projects:
In addition, my participation as an interface designer and front-end software developer in the following two academic research projects contributes secondarily to this chapter:
While this thesis is my own original work, and many ideas presented in this chapter are fully original, some of the specific details, theories and ideas presented in this chapter arose or were developed or augmented through my close collaboration, discussion and ideation with other researchers, including:
Due to these collaborations, and because many of these projects ran in parallel with my PhD research, it is impossible to precisely delineate the origin of each idea or insight. In practice, ideas from my developing thesis and my own thinking informed the projects’ trajectories and thinking, and vice versa. These ideas would not have emerged in this form without my participation, so they are not the sole intellectual property of others; but equally I would not have reached the same conclusions alone, so the ideas are not solely my own either. All diagrams and illustrations were produced by me, except where specified, and the overall synthesis and framing presented in this chapter is my own original work. Where this chapter includes material from the four projects, that material is either already public, or permission has been obtained from the corresponding project teams.
To provide a structure for cataloguing the insights conveyed by this chapter, I use a Theory of Change (ToC) framing. ToC is a set of methodologies commonly used by philanthropists, educators and those trying to improve the lives of disadvantaged populations (Brest, 2010); the theories can be used in different ways, including planning, participatory design and field evaluation of the effectiveness of new initiatives. There are many different implementations, but common to most of them is a focus on explicitly mapping out desired outcomes (Taplin and Clark, 2012), with a clear focus on who is acting and whether the change being brought about is a change in action or a change in thinking (Es, Guijt and Vogel, 2015). In this chapter, ToC theory will be used in a very limited way: not as a methodology, but simply to provide a structural frame for proposed changes, as described below. Using ToC to evaluate the effectiveness of proposed change approaches in action in society would be well beyond the scope of this thesis. Nonetheless, this frame is a useful way to map out the different approaches to changing the world in pursuit of the ideal of better HDR.
Figure 29 illustrates the aspects of ToC thinking that section 7.4 will use as its frame. Specifically, desired changes can be broken down into:
At the same time, desired changes can be broken down into:
These two splits form two axes, producing four quadrants that represent different types of change, which are shown in Figure 29 and described here:
Key to ToC thinking is the idea that making changes in one quadrant can stimulate change in others; for example, collective learning about data attitudes and practices, such as the research conducted in this PhD, (lower left quadrant) could inform the design of new technologies, interfaces or processes (lower right quadrant), which if built could make new structures available to have an impact on improving individual-provider relationships (upper-right quadrant). The changes to those relationships could then in turn lead to individuals thinking and feeling differently (upper left quadrant), for example feeling more empowered or having greater awareness of data practices.
Before engaging with the practicalities of pursuing change, it is valuable to revisit the stance from which we approach this change. As outlined in 3.2, the research of this PhD has been grounded in participatory action research and experience-centred design; by using a Digital Civics (Vlachokyriakos et al., 2016) frame to gain deep understanding of people’s needs and the ways those needs are not fully met, we can see how the world needs to change. Section 3.2 already outlined that we can consider such research as political, seeking to correct an imbalance in the world. In this chapter, we look beyond identifying what change is needed, and step into the role of activist, exploring how individuals and groups can actually change the world they inhabit.
In doing so, we can consider ourselves (those who pursue better Human Data Relations, or HDR reformers as a shorthand) as a recursive public (Kelty, 2008; ‘Recursive Public (Discussion Page)’, no date), albeit a nascent one. This is a term originating in the free software movement to describe a “collective, independent of other forms of constituted power, capable of speaking to existing forms of power through the production of actually existing alternatives”. This term captures the idea that through various means at our disposal: participatory research, experience-centred design, engineering software prototypes, exertion of legal rights, and efforts to raise public awareness, we seek to modify the systems and practices we live within in pursuit of our goals. This collective around better Human Data Relations does not yet exist as a named and identifiable public (Le Dantec, 2016) but its members congregate around emergent collectives in interconnected and overlapping spaces, most notably the MyData community (MyData, 2017) and its members, but also research and activism agendas including but not limited to: digital rights (‘Open rights group: Who we are’, no date), gig economy worker rights (Kirven, 2018), privacy by design (Cavoukian, 2010), data justice (Taylor, 2017; Crivellaro et al., 2019), critical algorithm studies (Gillespie and Seaver, 2016), humane technology (Harris, 2013) and explainable AI (‘Explainable AI: Making machines understandable for humans’, no date).
Whether these disparate groups coalesce into a single identifiable public remains to be seen, and so too whether the term this thesis offers of Human Data Relations is sufficient to capture that public (at least, it provides a descriptive umbrella term). Nonetheless, the breadth of research and innovation and activism happening in this space validates both the need and the desire for such a recursive public around better HDR to exist. Therefore, this chapter takes an unashamedly critical view of the status quo, favouring disruptive societal changes that would further the objectives of better Human Data Relations and providing actionable approaches that will be of use to the members of this public. The chapter asks, “How can we change the world into the one we want?”
Chapter 6 established six ‘wants’ in HDR: visible, understandable and usable data; process transparency, individual oversight and decision-making involvement. At a simplistic level, therefore, ‘better’ HDR can be achieved by working to improve upon those six aspects of data interaction. However, as this section will explain, HDR can be conceptually split into two distinct motives, to which those six wants apply differently; it is therefore useful to develop the concept of HDR further. As background to this duality of motivation, it is first necessary to examine more closely what role data plays in people’s lives.
In the modern world, where almost anything can be encoded as data, and given that many previously analogue objects and activities now have digital equivalents, the concept of data has become broad and hard to pin down. Underlying Human Data Relations is the question of what roles data can play in people’s lives – what it is to people. Through the Case Studies, external work and my prior learning, I have so far identified eight distinct lenses through which to consider how people might relate to data. These are modelled in Table 15.
| Way of thinking about data | Explanation & Implications |
|---|---|
| Data as property | Data can be considered as a possession. This highlights issues of ownership, responsibility, liability and theft. |
| Data as a source of information about you | Knowing that data contains encoded assertions about you and can be used to derive further conjectures enables thinking about how it might be exploited by others, but also how you can explore and use it yourself for reflection, asking questions, self-improvement and planning. It invites consideration of the right to access, data protection, and issues around accuracy, fairness and misinterpretation / misuse. |
| Data as part of oneself | A photo or recording of you, or a typed note or search that popped into your head could be deeply personal. This lens on data highlights issues around emotional attachment/impact, privacy, and ethics. |
| Data as memory | Data can be considered as an augmentation to one’s memory, a digital record of your life. This lens facilitates design thinking around search and recall, browsing, summarising, cognitive offloading, significance/relevance, and the personal value of data. |
| Data as creative work | Some of the data we produce (e.g. writing, videos, images) can be considered as an artistic creation. This lens enables thinking about attribution, derivation, copying, legacy and cultural value to others. |
| Data as new information about the world | Data created by others can inform us about previously unknown occurrences in our immediate digital life or the wider world. This lens is useful for thinking about discovery, recommendations, bias, censorship, filter bubbles, and who controls the information sources we use, as well as who will see and interpret data that we generate and what effects our data has on others. |
| Data as currency | Many data-centric services require data to be sacrificed in exchange for access to functionality, and some businesses now explicitly enable you to sell your own data. This lens highlights that data can be thought of as a tradable asset, and invites consideration of issues of data’s worth, individual privacy, exploitation and loss of control. |
| Data as a medium for thinking, communicating and expression | Some people collect and organise data into curated collections, or use it to convey facts and ideas, to persuade or to evoke an emotional impact. This lens is useful to consider data uses such as lists, annotation, curation, editing, remixing, visualisation and producing different views of data for different audiences. |
When considering HDR, it is important to recognise that people may think of their personal data through any or all of these ‘lenses’ [Karger et al. (2005);2.2.2] at any given time, and any process or system design involving data interaction should take these into account.
Looking across this set of lenses, it is possible to identify four specific roles that data can serve:
To unpack HDR further, it is important to highlight the difference between humans relating to data, and humans relating to information. Human Data Interaction (HDI) concerns the way people interact with data. Mortier et al. (Mortier et al., 2013, 2014) defined the field of HDI without distinguishing data (the digital artifact stored on computer) from information (the facts or assertions that said data can provide when interpreted). This is an important distinction. The parallel field of Human Information Interaction (HII) originated in library sciences, and considers the way humans relate to information without regard to the technologies involved (Marchionini, 2008). William Jones et al. called for a new sub-field of HII in an HCI context, observing that it is important to include a focus on information interaction because HCI can “unduly focus attention on the computer when, for most people, the computer is a means to an end – the effective use of information” (Jones et al., 2006). DIKW theory (see 2.1) highlights that interpretation of data to obtain information is a discrete activity. This was borne out in the findings of Case Study Two, where it became clear that participants have distinct needs from data, and from information (5.4.3.2). Access to data and information is critical to both understanding and useability, as detailed in sections 6.1.2 and 6.1.3.
Drawing on this theory, we can see then that in considering Human Data Relations, there are in fact three distinct artifacts to consider:
By making this distinction between the two types of information which people might interact with, and considering the six wants in Chapter 6, it becomes clear that there are two very different reasons why people might want better HDR:
to acquire information about one’s data, so that one might exert control over and make informed choices about where the data is held and how it is used, in order to be treated fairly and gain more control over the use of one’s personal data. This is Personal Data Ecosystem Control (PDEC).
to acquire information about oneself, so that one might gain insights into one’s own behaviour and gain personal benefits from those insights or use them to make changes in one’s life. This is Life Information Utilisation (LIU).
The two distinct processes that individuals might go through in pursuit of these motives are exemplified in Figure 30. PDEC is a process of holding organisations to account over, and managing, what happens to personal data, often regardless of what it means, whereas LIU is more concerned with what the data means and its inherent value as encoded life information, regardless of where it is stored and how it is used. This novel way of modelling the motivations for data interaction was first proposed in my 2021 workshop paper (Bowyer, 2021).
Life Information Utilisation is a superset of Self Informatics (SI), as defined in 2.2.3. It includes all purposes relating to self-monitoring and self-improvement through data, but also includes all other uses of personal data including creative expression, evidence gathering, nostalgia, keeping, and sharing. Many of these desires were expressed in Case Study Two (see Table 12 in 5.3.3), and also hinted at in the Early Help context (4.4.1). While the existence of digitally-encoded information clearly unlocks new possibilities, LIU has existed in some form throughout human civilisation, as seen through analogue processes such as storytelling, journalling, scrapbooking, arts and crafts.
In the LIU context, the most important wants to focus on improving are data understandability (6.1.2) and data useability (6.1.3), which relate closely to the HDI concepts of legibility and agency respectively.
Unlike LIU, Personal Data Ecosystem Control is a new individual need, arising as a result of the emergence of the data-centric world (2.1, 2.2.4). It only became necessary when organisations began to collect and store facts about people as a substitute for direct communication and involvement. The more data is collected about individuals, and the more parties collect and share that data, the greater the need for individuals to learn about that data so that they might influence its use (or risk their lives being affected in unexpected or potentially unfair ways). PDEC is a direct response to the power imbalance between data holders and individuals that the World Economic Forum described in 2014 [2.1.2;Hoffman (2014)].
In the PDEC context, multiple data wants are important: visible data and transparent processes, as well as individual oversight and involvement. For simplicity, the former two wants can be referred to collectively as “ecosystem transparency”, and the latter two as “ecosystem negotiability” (drawing on the HDI concept of negotiability), and these terms will be used below.
In this section I will describe the high-level obstacles to better HDR, arranged into six groupings. The first four groupings correspond to the six wants identified in Chapter 6. Two additional groupings have been included to cover more general human and technical challenges that affect all endeavours in this space:
People struggle to relate to data. It is not relatable because it is complex, not presented as meaningful information, and not easily interpretable. People lack tools to gain insights from it. To overcome this obstacle, more work is needed to make data relatable and to provide tools that can deliver valuable meaning and insights.
When data is transformed into information that can be related back to moments, people, places or relationships in people’s lives, it becomes instantly relatable. [from BBC: Data becomes meaningful when people are able to associate it with the real substance of their lives - people, places, organisations, causes or topics they care about. Therefore, the more associations you can find in data the more valuable it is.]
We can consider the different types of information in people’s lives:
We need to model life information, not data.
Every individual’s personal data is scattered across multiple providers, devices and apps, and held by hundreds of third parties. The complexity of a modern-day digital life is unmanageable and overwhelming. People are inevitably ignorant of much of their data and its use. This can lead to resignation and apathy. To overcome this obstacle, approaches must be identified that recognise the scattered, complex reality of each individual’s personal data ecosystem and begin to make it visible and understandable.
No matter how understandable the data itself is, it is also critical that people can access information about their data ecosystem. Without this, there will always be aspects of their data that are beyond their awareness or beyond the reach of what they can access, control or manage. Many tools today do not recognise this, and build for a world that does not exist. It is important that people have tools that allow them to interact with multiple providers and data sources across their digital life.
Almost all data is constrained in some way, limiting its useability. It may be held by a particular provider and inaccessible. It may be stored in a format which is hard to use or change. It may only be visible after a delay. It may be unchangeable. To overcome this obstacle, we need to find ways to extract data from its current constraints and to remove some of these technical or temporal limitations.
Even once an individual has gained possession of or access to the relevant parts of their personal data, it can be extremely hard to use. This partly comes from a lack of malleability - the ability to break it down, look at it from different perspectives, and reconstitute it in different ways. Put simply, people need to be able to interrogate their data - to ask questions of it. This requires more than just an ability to view visual representations of data; it requires an ability to interact with the data and produce new views and insights that can help to answer specific questions. Making available some of the PIM and SI capabilities described in 2.2.2 and 2.2.3 can help to address this, but further capabilities are needed to fully overcome this obstacle.
Many computer operating systems and interfaces today treat files as the basic material that an individual can manipulate. To truly empower users to make use of their data, we need to move to a model where pieces of life information – facts (or assertions) – can be created, deleted, moved, grouped, annotated, copied, shared, modified, labelled, organised, separated or otherwise manipulated instead. So far, people access data within products. But what they need is a platform, not a product. We need an information operating system.
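To make this idea concrete, the following is a minimal sketch of what such a fact-centric model might look like, using pieces of life information rather than files as the unit of manipulation. It is purely illustrative: the names (`Fact`, `FactStore` and their operations) are my own hypothetical constructs, not drawn from any existing system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Fact:
    """A single piece of life information: an assertion, not a file."""
    subject: str    # who or what the assertion is about, e.g. "me"
    predicate: str  # the kind of assertion, e.g. "visited"
    value: str      # the content of the assertion, e.g. "Newcastle"
    labels: set = field(default_factory=set)
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

class FactStore:
    """Operations a hypothetical 'information operating system' might expose."""
    def __init__(self):
        self.facts = []

    def create(self, fact):
        self.facts.append(fact)
        return fact

    def label(self, fact, label):
        # annotate/organise an individual assertion
        fact.labels.add(label)

    def query(self, predicate):
        # produce a new view of the data to answer a specific question
        return [f for f in self.facts if f.predicate == predicate]

store = FactStore()
f = store.create(Fact("me", "visited", "Newcastle"))
store.label(f, "travel")
print(len(store.query("visited")))
```

The point of the sketch is that operations such as creating, labelling, grouping and querying apply uniformly to every assertion, wherever it originated, rather than being locked inside the product that produced the data.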
The first and most obvious barrier that individuals face in managing a complex personal data ecosystem is that, to a great degree, they cannot see it. For example, it is very easy to allow a handful of communication and social media apps access to your address book or contact list, and before you know it you have created a complex and unmanageable network of connections that silently sync and propagate your addresses and phone numbers across the Internet. And there are deeper layers which are not even slightly visible to users: networks of data brokers, advertisers and digital cookie companies exchanging user identifiers, activity data and personal information about you while you browse or use apps. As Chapter 5 showed, even though people have been granted new rights to access their data and information about provider data sharing activity, the ability to effectively execute those rights to build up a meaningful picture of your personal data ecosystem is severely limited by inconsistent, incomplete or unclear responses. The strong negative practical impacts of today’s complex digital lives were already described in section 2.2.4; managing the complexity is an overwhelming, unmanageable task that even personal data experts are not fully able to get a handle on. The ability to provide a user with ecosystem transparency is hindered by the complexity and multiplicity of the data relationships they have been encouraged to set up, and by a lack of tools to provide a meaningful, or indeed any, view of those relationships. A further aspect to this obstacle is that no individual or organisation has the ability to see the whole of a user’s ecosystem, and there is little commercial motive to try to solve this problem, as every provider focuses just on their own apps, websites and services.
From this complexity an additional obstacle becomes evident. There is scant attention to information about your data. Even where data access rights are executed (or data is shared via human means such as in Chapter 4), the attention is on the data itself: what it says. Chapter 5 shows that some of the most desired information was not the data itself, but how it is used and shared and what is inferred from it, yet this was rarely forthcoming. There are many pieces of information that can be quantified about an individual’s data, as illustrated in Figure X, which I created during my internship at BBC R&D:
[ EXPLAIN ASPECTS]
To provide users with meaningful transparency, many of these aspects will need to be tracked and visualised; not an easy task given the complexity and the potential to overwhelm a user, but nonetheless a vital first step on the road to giving individuals the ability to have oversight of their personal data ecosystem and take action within it.
[ADD REFERENCE BACK TO 2.2.2 METADATA]
A number of researchers have independently identified the importance of keeping the history of a piece of data with it. Without context, data loses meaning (a phenomenon witnessed in Case Study Two – see 5.4.3.1). The idea that what has happened over time not just to an individual but to a piece of data is important is a key part of the thinking behind temporal PIM systems, from Lifestreams (Freeman and Gelernter, 1996) to activity streams (Hart-Davidson, Zachry and Spinuzzi, 2012) (see 2.2.2). William Odom, Siân Lindley and colleagues proposed the idea of file biographies, which view the lifetime of a file as something that should remain connected and traversable in order to understand the context of the file at its different interaction points. Significant research in this space has been undertaken by Professors Mike Martin and Rob Wilson at Northumbria University, formerly Newcastle University, who express the idea of data with provenance; in other words, that data must carry with it the details of why it exists, how it came to be, and what has happened to it since its inception, and that provenance must be communicated alongside any visualisation of the data if it is to be fully understood (i.e. for its context []). This plays into the ideas of Gitelman, Neff and others that data is not neutral and in fact is inherently biased, since it was created for a specific purpose with a specific agenda in mind (Gitelman, 2013; Neff, 2013). [ADD MORE DETAIL FROM MIKE MARTIN PAPER AND EMAIL HERE]. While it is not a solution in its own right, it is clear that data with provenance is very likely to be a critical and valuable part of any design that aims to help individuals gain an overview of their complex and invisible personal data ecosystems.
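As an illustration of the principle, a provenance-carrying datum might be sketched as follows. The class and field names are my own invention for illustration only; they are not drawn from Martin and Wilson’s work or any published provenance standard.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class ProvenanceEvent:
    """One step in a datum's history: who acted, what they did, and why."""
    actor: str    # who acted, e.g. "GP surgery"
    action: str   # what happened, e.g. "created", "shared", "derived"
    purpose: str  # the agenda behind the action
    when: datetime

@dataclass
class DatumWithProvenance:
    """A value that carries its full history alongside itself."""
    value: str
    history: list = field(default_factory=list)

    def record(self, actor, action, purpose):
        self.history.append(
            ProvenanceEvent(actor, action, purpose, datetime.now(timezone.utc)))

    def describe(self):
        # provenance is communicated alongside any presentation of the data
        trail = " -> ".join(f"{e.actor}:{e.action}" for e in self.history)
        return f"{self.value!r} [{trail}]"

d = DatumWithProvenance("blood pressure: 120/80")
d.record("GP surgery", "created", "routine check-up")
d.record("insurer", "shared", "risk assessment")
print(d.describe())
```

The design choice being illustrated is that the history travels with the value itself, so any visualisation of the datum can (and should) surface how it came to be and who has handled it.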
In the pursuit of individual oversight and greater involvement, the power imbalance between individuals and data holders (2.1.2) becomes most clear. While the Internet itself initially held the promise of being a great leveller and of empowering individuals, this potential has largely been suppressed. Data is owned and controlled by service providers, who also design and control the interfaces, apps, websites and devices through which individuals access those services; they thereby control what (if anything) of the data stored behind the scenes, and of the internal processes that use that data, is visible, and how such data and processes are represented. In Jasperson et al.’s detailed metatriangulation review of types of power that affect technology systems (Jasperson et al., 2002), we can identify a number of specific types of power that are clearly in effect in today’s digital data-centric service provider context:
[ADD TYPES OF POWER FROM JASPERSON WITH CONTEXTUAL EXPLANATIONS] [structual power, resource control, centralisation etc]
A helpful analogy for the relationship between provider and user can be seen in the design of the Panopticon: a style of prison architecture, designed by Jeremy Bentham [REF], intended to elevate the power of the prison guards to observe all the prisoners easily at any time and to diminish the ability of prisoners to operate in privacy or to see those in authority. Foucault [REF], analysing the Panopticon, makes clear that such design is political, and shows that power can be enforced by the environment. This is a useful mental scaffold to keep in mind; as explained below [REF], we can think of today’s digital landscape as similarly power-enforcing. Code is law [ADD REF Lessig], and interfaces limit what individuals can do. By holding data behind interfaces shaped to serve their own interests, the data holders control the landscape. [UPDATE THIS BASED ON OTHER WRITING ABOUT PANOPTICON]
Sitra’s #digipower investigation [REF], which I led on behalf of Hestia.ai, was a successor to my Case Study Two, but worked with high-profile politicians and European influencers and added additional technical audit techniques. Its focus was not on the individual experience of data access, but on using those experiences and the acquired datasets to better understand the data ecosystem. Through this research, a model was produced to understand the ways in which service providers (and in particular the larger ecosystem-level platform providers such as Google and Facebook) exert power over individuals and smaller organisations. This model is reproduced in Figure X:
[TODO: redo this diagram]
[ADD EXPLANATION AND REFERENCE TO THE PIDOUX REPORT]
Across this landscape it is clear that the most powerful data holders exert huge influence over what is knowable and what is do-able in the digital world. Individuals’ and activists’ abilities to redress the balance are hindered by the fact that they are operating on terrain that the incumbent platform and service providers effectively control.
A key mechanism to highlight here is that the accumulation of information is implicitly and objectively a form of power. This is consistent with participants’ observations in 5.4.4.1 that data holding and limiting access to it is a source of power. In terms of this being an obstacle, we can therefore see that as long as current platforms and service providers are free to collect so much personal information, the information landscape will remain imbalanced and individuals will not be able to acquire ecosystem negotiability.
Today’s digital landscape is fractured [REF Splinternet]; myriad providers vie to pull users into service relationships or connected ecosystems that will channel money and attention towards their own products and services. This is most evident in companies such as Apple, Amazon, Facebook, Google and Microsoft (the so-called ‘big five’), which have multiple touchpoints into people’s lives through different devices, apps and services. We can think of these providers’ sub-Internets as walled gardens or silos [REF]. Commercial motives encourage them to keep users in their own proprietary spaces (so that the resultant ad revenue can be captured), and in order to maintain subscription revenues it is in providers’ interests to make it hard for individuals to leave or switch providers. In effect, providers build for a world that does not exist, in which every individual is imagined to interact only with that single company’s interfaces. There is little incentive to open up the ecosystem when the free flow of information and of users might result in loss of income for the company in question. Users with negotiability would find it easier to leave; this also encourages keeping users in the dark (5.4.2). The less agency and negotiability users have, the more freedom the provider has to do exactly what they want with their data. In this context users are, as Lawrence Lessig wrote, ‘pathetic dots’ [ADD REF]. Thus service providers continue to build proprietary, incompatible silos.
But it is not only commercial motives that encourage insular attitudes to personal data and user service provision. In the SILVER project [ADD REF], meetings with local authorities and care providers revealed deep organisational and technical barriers within the public sector: health organisations were typically unwilling to share health data with social care services, while different councils, community services and charities typically operated separate IT systems, each attempting to construct its own digital picture within its own database, with very little interoperability. The problems of this technical reality are explored further in 4.1.2. From what we observed, the introduction of GDPR and similar regulations has made this problem worse, not better, as organisations and departments become increasingly paranoid about storing or sharing data they should not, or about the risks of acting upon data without sufficient consent. We learned of practices such as care organisations sharing information verbally by telephone so that no digital trail was left.
It is clear that throughout society, there is a trend towards organisations being reluctant to work together around people’s data, inclined towards collecting their own databases and not sharing them.
[TODO: also mention resistance to change]
As a result of the practices and motives described above, the last decade has seen a marked reduction in individuals’ agency. When software was sold in a box, manufacturers competed on which product would let the user take home the greatest range of features and capabilities. New releases with new features drove new product sales. But in the cloud computing era, a smaller set of core features done well is sufficient to guarantee ongoing subscription revenue from a user. Savings in development and support costs can be made by reducing feature sets. The relentless pursuit of increased profits and further cost saving sees products lose, not gain, features. Interfaces are reshaped to serve businesses’ interests first and foremost. As described in 2.3.5, the primary concern is making user behaviours constrained, predictable and profitable, rather than meeting users’ needs or providing maximal value. One of the most revealing examples is the case of Facebook. Users were once free to consume their friends’ posts in other clients via RSS feeds. These were removed, forcing users to use only Facebook’s interfaces, where their eyeballs can be monetised (Twitter likewise largely closed its APIs, killing off many third-party readers). Facebook users also used to be able to view the latest updates from a particular list of friends or news pages. These features too were removed, presumably to increase monetisation through the main feed. The ‘Friends’ page on Facebook currently shows a list of recommended new friends; accessing your current friend list requires an extra click. Encouraging users to grow their networks is prioritised over user convenience.
Companies also change their practices to limit users’ agency (and their own accountability to customers). For example, Facebook recently announced that they will no longer collect historical location data from users (though they will still use location information). This makes it harder for users to see how their data has been used. TikTok announced that they will rely on legitimate interest rather than consent when using users’ activity data to personalise the app experience, removing users’ ability to withdraw consent to such use. Unchecked, it is clear that these trends to reduce users’ agency and further providers’ interests will continue, creating another obstacle to be tackled.
Earlier in this thesis the concept of a data self has been introduced (4.4.1, 4.4.3, 6.3). We know from both the preliminary study with families (Bowyer et al., 2018) and Case Study Two that data serves as a proxy for direct human involvement of the served individual(s). Put simply, service providers try to minimise interaction with people, by maximising their usage of data to represent people. We are viewed through the distorted lens of our data selves. Despite the inherent challenge of representing people fairly and accurately in data [Bowyer et al. (2018); 4.4.1; 5.4.4.1], this is the default modus operandi for service provision today. This therefore represents a key obstacle to ecosystem negotiability today: how can individuals be given the ability to influence and shape the data self that providers will use to understand them and make decisions?
While in the previous four subsections it was possible to identify obstacles relating to specific HDR wants, there are also some readily identifiable obstacles that will affect all our endeavours to improve HDR. Obstacles relating to human challenges are described in this section, and technical challenges are addressed in the following section, 7.3.6.
In considering the recommendations of Case Study One (shared data interaction between the state and the individual) and of Case Study Two (new human-centric data practices by service providers), and in exploring possible new human-centric system and interface designs through my work with BBC R&D, it is evident that even if new human-centric types of computer system or service interaction practices can be created, we cannot assume that people will be inclined to use them. Today, data is overwhelming, complex, and ‘sounds boring’. There is no denying that, currently, engaging with one’s personal data economy in any role beyond that of passive consumer is hard work. People routinely accept data sacrifice, click through T&Cs and cookie banners, and are unwilling to do the work of asserting control over their digital lives (or in some cases lack the technical literacy, comprehension or skill to do so). There is no clear demand for holistic and novel ways of managing one’s digital life and exerting agency and negotiability over it. This is an obstacle that affects all HDR improvement approaches we see, and indeed is why many companies in the emergent PDE economy (2.3.4) struggle to find a business model. But this should not deter disruptive innovation, nor does it indicate that such offerings would not be useful. As Henry Ford famously said, “If I had asked people what they wanted, they would have said faster horses.” Nonetheless, it is a clear overarching obstacle to overcome.
Through work at BBC R&D exploring how to better connect people with their data, it became clear that there is a way to combat such user indifference and apathy. It emerges from the realisation that the way people find value in data is to connect it to their lives. The more that people see relatable life information and can imagine ways to harness that information in their everyday life, the more motivated they will be. [include the three concentric circles diagram a bit like the one Rhianne used]
As an example, BBC colleague Jasmine Cox and I imagined focusing on address books and contact lists as a strong, relatable starting point that could readily generate user demand. Many people face a complexity they cannot easily manage when it comes to the automated syncing and sharing of potentially sensitive contact information between devices, apps and providers, and developing human-centric personal information management capabilities to bring that messy situation under control would offer a clear and tangible benefit to users.
Another example that is helpful to consider is the example from my 2011 article: that of a vacation, as shown in Figure X (Bowyer, 2011). Today, all the information around such a holiday is scattered across multiple systems - emails, online provider bookings, chat logs, cloud-synced photos, web browser bookmarks, smartphone location logs, etc. It is not hard to imagine that a system able to bring all related information about that vacation together in one central place could deliver huge value to users and be very compelling. Such context-targeted human-centric offerings have a much greater chance of generating interest and impact than offerings that merely allow you to “organise your data” or some other abstract phrasing.
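To make the idea concrete, a minimal sketch in Python of how such gathering might work, assuming entirely hypothetical record types and field names (no real provider export formats are implied): each scattered item is first reduced to a common shape, and items are then pulled together by time window.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical unified record: each source item reduced to (source, date, summary).
@dataclass
class LifeRecord:
    source: str
    when: date
    summary: str

def records_for_trip(records, start, end):
    """Gather every record, whatever its origin, that falls in the trip window."""
    return sorted((r for r in records if start <= r.when <= end),
                  key=lambda r: r.when)

scattered = [
    LifeRecord("email", date(2011, 7, 2), "Flight confirmation"),
    LifeRecord("photos", date(2011, 7, 5), "Beach album (37 photos)"),
    LifeRecord("bank", date(2011, 6, 20), "Unrelated grocery payment"),
    LifeRecord("location", date(2011, 7, 4), "GPS trace: old town walk"),
]

trip = records_for_trip(scattered, date(2011, 7, 1), date(2011, 7, 10))
# The three in-window records remain, in date order; the June payment is excluded.
```

The hard part in practice is of course the first step - reducing each provider's format to the common shape - which is precisely the interoperability obstacle discussed later in this chapter.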
As with any public offering of a product or service, it is important to start with identifying a problem or need, and to demonstrate a potential tool or solution that can help. In particular, there is a need to let people do new things that they could not do before. This has been identified as a key ingredient of user empowerment (Meschtscherjakov, Wilfinger and Tscheligi, 2014; Schneider et al., 2018). This became a driving influence for design thinking on the BBC R&D Cornmarket project. It is not enough to believe that “If you build it, they will come.”
Obstacle 8 (7.3.4.3) already touched on the issues around different companies developing standalone walled-garden or silo user experiences, from a sociotechnical or systemic standpoint. But there is a very specific technical problem that must be acknowledged across all HDR improvement approaches: it is very difficult to build technical systems that connect and exchange data with each other. This was witnessed first hand by our development team on the SILVER health data interface project [REF], which endeavoured to build a bridge making health data available to Early Help support workers. Not only is there a lack of standards, with each organisation using its own databases and formats for storing data, but often the programming interfaces (APIs) that would be needed to connect different systems (sometimes legacy systems) do not exist or are insufficient. Furthermore, there can be issues around licensing and consent when data passes from one domain to another. Data sharing agreements must be established, especially in the public sector, which is by its nature more liable to scrutiny and accountability. At an abstract level, the technical obstacle is one that has always faced the tech industry: there is often no universally agreed way to represent important concepts - in this case human-centric information concepts such as events, social media posts, website visits, location history information, app activity, etc. And any entity that does create a standard then faces the challenge of persuading others that their standard is the best one to use. In general, standards work best when established by non-commercial industrial standards bodies (for example the World Wide Web Consortium (W3C) or the International Organization for Standardization (ISO)) and then mandated through policy such as European Union law. Such standards must be established with input from industry experts.
Following on from the previous obstacle, but a subtly different point, is that it is technically difficult for machines to handle human information. Without deliberate coding, software can only understand streams of binary data as files or datasets, and does not understand what people, places, events or entities the facts within the data relate to. Therefore, it is necessary to consider how algorithms and systems can be designed to include an understanding of the semantics (meaning) of the information within the files and data records they handle. For example, the data record representing a post on Twitter looks entirely different to the data record representing a post on Facebook. No algorithm can recognise or unify these disparate pieces of data as two instances of the same semantic concept until the specifics of each data format can be mapped back to a common semantic abstraction of a “social media post”. [find meaning in user’s data]
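As an illustrative sketch only - the field names below are invented stand-ins, not the real Twitter or Facebook export schemas - the mapping from provider-specific records to a common “social media post” abstraction might look like this:

```python
from datetime import datetime, timezone

# Hypothetical field layouts for two providers' export formats.
tweet = {"full_text": "Hello", "created_at": "2020-01-01T10:00:00Z", "user": "alice"}
fb_post = {"message": "Hi there", "timestamp": 1577874600, "author_name": "alice"}

def to_social_post(record, provider):
    """Map a provider-specific record onto one common 'social media post' concept."""
    if provider == "twitter":
        return {"text": record["full_text"],
                "posted": record["created_at"],
                "author": record["user"]}
    if provider == "facebook":
        # Even the timestamp needs translating: epoch seconds vs ISO string.
        posted = datetime.fromtimestamp(record["timestamp"], tz=timezone.utc)
        return {"text": record["message"],
                "posted": posted.isoformat(),
                "author": record["author_name"]}
    raise ValueError(f"no mapping for provider: {provider}")

posts = [to_social_post(tweet, "twitter"), to_social_post(fb_post, "facebook")]
# Both records now share the same fields and can be merged into one timeline.
```

The sketch shows why this is laborious: every provider, and every format revision, needs its own hand-written mapping until shared standards exist.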
This leads to the next insight: that to build systems and interfaces that are able to deal in human concepts and represent the elements of everyday life requires building systems that store semantic context and semantic associations, not just raw bundles of data. This is advocated by the Web’s inventor Tim Berners-Lee in his vision of a Semantic Web (Berners-Lee, Hendler and Lassila, 2001) and by proponents of networked PIM systems (2.2.2). There is a need to develop standard ways to digitally model facts and assertions about users’ lives, so that those disparate pieces of data can be unified, connected, correlated and compared. Sizable industries have built up around Content Analytics and Enterprise Content Management. Through the capture of metadata at the point of data recording, and through subsequent programmatic analysis of stored data, as illustrated in Figure X (Bowyer, 2011), we can begin to teach computers what the data we store represent. Machine learning technologies and Artificial Intelligence have pushed machine understanding of human words, images and content to impressive levels in recent years, and such technologies can certainly be helpful, but at its core what we are talking about here is something much simpler than AI: it is simply about labelling datapoints in as many different ways as possible so that those datapoints can be associatively retrieved from many different angles, and providing humans with ways to amend incorrect labels and to reclassify data or apply new semantic associations.
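A minimal sketch of this labelling-and-correction idea (the datapoint identifiers and tags are hypothetical, and this is not a proposal for a real system design): datapoints are labelled from many angles, retrieved associatively by label, and the human can override the machine's labels.

```python
from collections import defaultdict

# Label index: each tag maps to the set of datapoints carrying it.
labels = defaultdict(set)

def label(datapoint_id, *tags):
    """Attach any number of labels to a datapoint."""
    for t in tags:
        labels[t].add(datapoint_id)

def relabel(datapoint_id, old, new):
    """Let the human correct the machine: move a datapoint between labels."""
    labels[old].discard(datapoint_id)
    labels[new].add(datapoint_id)

label("photo-123", "holiday", "lisbon", "family")
label("email-456", "holiday", "booking")

# Associative retrieval: everything about the holiday, from whatever source...
holiday_items = labels["holiday"]
# ...and the user can then fix a mistaken place label.
relabel("photo-123", "lisbon", "porto")
```

Nothing here is intelligent; the power comes purely from multiple overlapping labels and the user's ability to amend them, which is the point being made above.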
Now, having established some of the key obstacles to improving HDR, we can move to considering what opportunities exist to pursue the HDR wants and to overcome those obstacles. This section will first introduce a framing for those opportunities, and then illustrate specific opportunities in detail.
In Figure X, the ToC frame introduced above in 7.1.3 / Figure 29 is used as a canvas upon which to position the different trajectories for change that could improve HDR. By enumerating the possible types of activity that can bring about change, each of the four quadrants’ core change trajectories can be named, as shown in purple, forming the backbone of the roadmap for improving HDR, which can be summarised thus:
[TODO: do we need a summary diagram here?] [Figure X: SUMMARY OF OPPORTUNITIES]
Research such as that conducted in this PhD is an example of the collective, internally focused activity that can be done in this quadrant to further the goals of better HDR: groups of people working together using a variety of techniques such as participatory co-design, interview-based qualitative studies, design prototype evaluation and other HCI techniques can gain new understandings of individual needs and experiences in HDR. However, rather than mapping out such possibilities, this section will focus on more novel approaches that go beyond traditional HCI research towards activities that are potentially more socially impactful.
Helps with: Ecosystem transparency
Through the emergence of new tracking tools such as TrackerControl and Apple’s App Privacy Report, individuals can observe the actual behaviour of the apps they use, providing a new means to identify potential data sharing destinations, to assess whether providers are meeting their promises, and to uncover new questions that can be asked of providers using data access rights. By collectively examining and comparing such data, it is possible to begin to map out the data ecosystem, as was done in the digipower investigation [REF].
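As a hedged illustration of the pooling step (the app names, tracker domains and data shapes below are invented examples, not real observations), individual observations can be merged into a collective map of which apps send data where, and how often each destination is seen:

```python
from collections import Counter

# Hypothetical per-participant observations: app -> set of tracker domains seen.
observations = [
    {"WeatherApp": {"ads.example.net", "metrics.example.org"}},
    {"WeatherApp": {"ads.example.net"}, "NewsApp": {"metrics.example.org"}},
]

def merge_observations(all_obs):
    """Pool individuals' observations into one map of data destinations,
    counting how many participants witnessed each app-domain link."""
    destinations = {}
    for obs in all_obs:
        for app, domains in obs.items():
            destinations.setdefault(app, Counter()).update(domains)
    return destinations

eco = merge_observations(observations)
# A link seen by several participants is stronger evidence of systematic sharing
# than a link seen once.
```

The counting matters: corroboration across participants is what turns individual anecdotes into ecosystem-level evidence.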
Helps with: Ecosystem transparency
This sort of combination of individual observations is just one of many ways in which individuals can, through working together, discover more information about data usages and practices. Collectives offer a powerful means to examine how providers categorise users and process their data, for example by comparing field values to understand the range of possible values or inferences a data holder might have stored, or by comparing variations in information presentations, data rights handling or customer service experiences to reverse engineer provider practices.
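A small sketch of the field-comparison idea (the field name and values are hypothetical, not drawn from any real provider's export): by pooling the value a given field takes across many people's subject-access-request extracts, a collective can surface the categories a data holder sorts users into.

```python
def field_value_space(exports, field):
    """Across many people's data exports, list the distinct values a field takes
    and who carries each value. A wide value space can reveal the categories
    a provider sorts users into."""
    values = {}
    for person, export in exports.items():
        v = export.get(field)
        values.setdefault(v, []).append(person)
    return values

# Hypothetical extracts from three participants' data exports.
exports = {
    "ana":  {"ad_interest_segment": "frequent_traveller"},
    "ben":  {"ad_interest_segment": "new_parent"},
    "caro": {"ad_interest_segment": "frequent_traveller"},
}

segments = field_value_space(exports, "ad_interest_segment")
# Two inferred categories surface; no single participant could have seen
# the range of values from their own export alone.
```

This is the simplest form of the reverse-engineering described above: each individual export reveals only one cell of the provider's classification scheme, while the pooled view begins to reveal the scheme itself.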
Helps with: Data Understanding, Ecosystem Transparency
Given the complexity of today’s digital landscape and the forces that hinder better HDR, there is scope for an industry to develop around ‘data understanding’ services. This can encompass everything from self-service tools people can use to gain insights over their data (such as those provided by Ethi or Hestia.ai), to workshops helping consumer organisations, journalists, regulators, lawyers and other interested parties to collectively gain understanding and value from data so that they might better achieve their goals, as well as serving a general educational purpose for example in schools. This industry is beginning to emerge, but faces challenges in funding, scalability, governance, and credibility and should be supported (REF Pidoux et al).
Given the shifting power balance of the information landscape outlined in 7.3.4.1, 7.3.4.2 and 7.3.4.4, it is clear that there is an opportunity, perhaps a need, for HDR reformers to carry out activities that monitor and publicise any changes that providers make that reduce individuals’ HDR capabilities. Having identified such changes it is then easier for those HDR reformers, and indeed the wider public, to fight to protect and maintain current capabilities, as we see in the Right to Repair movement [REF] or the Net Neutrality movement [REF]. [mention also the idea of pushing to make sure what should be done, is done - e.g. in GDPR returns]
Compounding the impacts of reduced agency described in 7.3.4.4 is the ‘dumbing down’ of technology. Apple, for example, encourages users to consider technology as ‘magical’, rather than as tools to be understood and harnessed; such thinking is manifested in their hardware design too: phones that cannot be opened up, expanded or repaired [REF]; the removal of accessory ports, disk drives and headphone jacks [REF]; increased controls over what can be installed on users’ devices and which areas of the disk can be modified [REF]. These changes simplify the technology and bring it to a more mainstream audience, something for which the iPhone and iPad must be given due credit - but they come at the cost of reduced user agency. Companies like Apple increasingly encourage users to think of technology as a black box, which you cannot and should not look inside.
[TODO rephrase this para as a better introduction to Seams] An important concept to understand in this space is that of seams. In ‘The Politics of Seams’ Storni outlines that current designs are incompatible with empowerment-in-use, and highlights the role of design seams (and their removal) as being a key determiner of user power [REF]. He says that the designer passes some power to the user through their design, but also, that users should be able to take some power on their own terms (repurposing etc). He talks [says what] about the problems of technology as magic/design as conjuring:
“Magical design prioritises pleasing and surprising a passive user who can only use the solution as authorised” – Cristiano Storni (Storni, 2014)
Therefore part of what we need to be doing is (a) highlighting and (b) removing seams/creating new seams between disconnected parts….
Groups of HDR reformers can combine development skills, innovation and disruptive design approaches to find and publicise new ways to circumvent providers’ efforts to control and limit their users’ agency, as illustrated by the use of web scrapers and web augmentation approaches to try and obtain information or functionality from providers that would otherwise be inaccessible. [also mention device tenancy (zeynep) and firefox containers/taking back power in the browser/browser as seam (reference Goffe et al)]
Collectives can also exert external influence in the adjacent ‘Defend & Create’ quadrant by using their learnings through data to demand change, as seen in the case of Uber drivers working together to obtain data on algorithmic judgements that affect their work and using that as evidence to help them demand fairer working conditions [REF]. [individuals collectively pressuring to improve GDPR responses, drive data portal improvements, etc] [traditional means e.g. press, public campaigns but also new ways e.g. mass GDPR or targeted GDPR] [mention dehaye’s pressure, leading to FB Off Site Activity, my success with Spotify] [noyb as example, also Privacy International, Bits of Freedom,] [cite examples from Mahieu papers][Mahieu, Asghari and Van Eeten (2018);mahieu2020a;Mahieu and Ausloos (2020)] [mention pooldata, data unions]
Helps with: Data Understanding, Data Useability, Ecosystem Transparency, General Human Challenges Affected by: Lack of interoperability (7.3.6.1/12)
As others have identified, one of the most promising models for giving people a new and improved relationship with their data is to create a place where one’s personal data can be stored and aggregated in one place, a personal data locker (see 2.3.4). This prospect was explored through the BBC R&D Cornmarket project during my internship, as detailed in 3.4.3.3 [OR MOVE SOME OF THAT TEXT HERE]. As alluded to in the quote opening Chapter 1, people’s data is scattered (see also 2.2.4 and (Abiteboul, André and Kaplan, 2015)), and simply providing the ability to bring data from sources together in one place can improve people’s understanding of their data and its ecosystem. This integration requires technical standardisation but also [BRING IN SOME TEXT FROM BBC BLOG ARTICLE] [REF previous appetite for PDS https://journals.sagepub.com/doi/full/10.1177/2053951720935616]
[TODO: talk about the capacity to unify]
[add quote from BBC research where people liked the concept of a place for your data]
Helps with: Life Information as Material, Limited machine understanding of data
As part of the BBC R&D Cornmarket project, I carried out extensive information modelling and design work with colleagues on how today’s common types of data might be modelled as life information in order to help with the stated goals.
First we need to consider what a piece of data is, which is different from what data format it is or what semantic concept it represents.
It is possible to use some abstraction of commonalities to group together pieces of data that can perform a similar role:
It can also be useful to model the different attributes of data in terms of what can be done to it.
We can imagine a simplified model of presenting information to users:
Helps with: engagement/efforts
Key idea to share: that the system should try and automatically associate data to entities. Also: calendar/contacts as a starting point; conjecture and assertion to reduce effort; learning, correcting, like an assistant [ref]. world2vec as an example of the sophistication of what is being done and harnessed for provider purposes [REF]
[citation: diagram by Alex Peysakhovic, from CS 4803 / 7643: Deep Learning Guest Lecture: Embeddings and world2vec, a guest lecture at Georgia Tech by Facebook AI research engineer Ledell Wu, given Feb 18th 2020. https://www.cc.gatech.edu/classes/AY2020/cs7643_spring/slides/L13_Embedding_world2vec_final_version.pdf , archived at https://web.archive.org/web/20211018015836/https://www.cc.gatech.edu/classes/AY2020/cs7643_spring/slides/L13_Embedding_world2vec_final_version.pdf]
one possible flow of how to identify data
show how attributes types etc (ref back to earlier diagrams) can be detected:
show how different types of entity can be identified. important to establish associations.
(add credit for cluedo board)
Life partitioning would allow conceptually the user to navigate information according to what semantic concepts it is or relates to.
(add credit for cluedo board)
A mockup of how this might look in a user interface (ignore first frame for now)
A mockup of a life interface dashboard (by Alex Ballantyne)
(to do: make this fit on a portrait page)
Key idea: detecting the ecosystem. Example: the subscription detector.
Key idea: Rivers of flowing info. Including people.
It is important to think about the capabilities people will have (expand on and map this back to all the PIM calls in 2.2.2)
refer back to dashboard mockup & data needs to be interrogable and malleable. What if there aren’t visualisations for your questions. But also, visualisations raise questions
Here are a series of data cards we used in a user research activity at the BBC (initially conceived by me, then condensed and reworked in collaboration with Chris Gameson).
Key idea: verbs. types of question. asking tools not just predetermined insights.
Key idea: self profiling (a la BBC) but also more generally, the “I want a bicycle” VRM idea (unless already covered)
Key point: regulation of landscape -> new moves DSA maybe also some ref back to end C5 and to the GDPR Guidelines
Key idea: generalised types of data (refer back to relevant model above) [reference to Solid Shapes etc]
[POSSIBLY CUT THIS ONE]
Key idea: Data as bringing different people together , ref living lab. also use of cards in my research and at BBC (and ref Urquhart?)
key idea: tackling resistance reducing liability, improved consent . less waste cost on broadcast advertising (could express it as a development of all the ad personalisation today). selling the benefits.
Key point: define it, distinguish it from technical skills/literacy as well as from numbercrunching Literacy
Key point: empowering individuals as investigators. Can help them with tools or learning programs.
[reiterate the answer to the question - the key 4 roles, 3 capabilities and N approaches needed for better human data relations] [how to group together the approaches]
[clarify the contribution of the thesis, with backreferences - 2 case studies, RQ answers, and the HDR roadmap]
[highlight future value/societal implications of the work]
Diagram used here unchanged from Hivos ToC Guidelines (Es, Guijt and Vogel, 2015, p. 90) under a CC-BY-NC-SA 3.0 license, whose authors state that this diagram was adapted from earlier work by Wilber (1996), Keystone (2008) and Retolaza (2010, 2012).↩︎
The group of HCI researchers involved in this panel were (with the exception of Raya Fidel) seemingly unaware of the existing HII field in library sciences as they positioned the publication as a call for a ‘new field’.↩︎
Of course, there is some overlap; the reason that organisations hold data is so that they can interpret it (usually algorithmically) to inform decision-making. In this way, organisations could be seen to be doing LIU of service users’ lives for their own benefit. From a human-centric perspective, this grey area is situated as part of PDEC, as from the individual perspective, how organisations understand you through information will inform decisions that affect your life. Thus, this can be considered part of the reason why one might want to exert control over use of your data, rather than being part of exploiting data to gain self-insights and personal benefits.↩︎
The illustrated processes assume reliance on existing data access processes such as GDPR, where the only access is through provision of a copy of one’s data. This is, in fact, not ideal, as it creates divergent versions that will quickly become out of sync; however, for the sake of simplicity this inefficiency is ignored here. Improvements upon this approach are explored in [INSERT REF]↩︎